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Abstract 

Recently discovered identities in statistical mechanics have enabled the calculation of equilibrium 
ensemble averages from realizations of driven nonequilibrium processes, including single-molecule 
pulling experiments and analogous computer simulations. Challenges in collecting large data sets 
motivate the pursuit of efficient statistical estimators that maximize use of available information. 
Along these lines, Hummer and Szabo developed an estimator that combines data from multiple 
time slices along a driven nonequilibrium process to compute the potential of mean force. Here, we 
generalize their approach, pooling information from multiple time slices to estimate arbitrary equi- 
librium expectations. Our expression may be combined with estimators of path-ensemble averages, 
including existing optimal estimators that use data collected by unidirectional and bidirectional 
protocols. We demonstrate the estimator by calculating free energies, moments of the polymer 
extension, and the metric tensor for thermodynamic length in a model single-molecule pulling ex- 
periment. Compared to estimators that only use individual time slices, our multiple time-slice esti- 
mators yield substantially smoother estimates and achieve lower variance for higher-order moments. 
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I. INTRODUCTION 



When a system is driven out of equilibrium by a time-dependent external potential, the 
probability of finding it at a particular position in phase space generally differs from the equi- 
librium probability corresponding to the instantaneous thermodynamic state, a phenomenon 
known as lag.— With an appropriate reweighting of the nonequilibrium density, however, it 
is possible to recover an equilibrium distribution.- Ensemble averages can be computed by 
exploiting this fact; the expected value of a phase-space dependent function, weighted by 
the dissipated work, is equal to an equilibrium average of the same quantity (Eq. [S]).- 1 ^ This 
relationship, which is relevant to analyzing single-molecule pulling experiments and analo- 
gous computer simulations, has been applied to estimating various equilibrium properties of 
real and simulated systems. (See Ref.- for a brief survey.) 

Many asymptotically unbiased statistical estimators may be developed from this identity. 
While these expressions will yield the same estimate in the limit of infinite sampling, their 
properties — such as bias, variance, and smoothness — will differ when applied to finite data. 
Due to challenges in collecting large data sets, it is preferable to use statistically efficient 
estimators that have minimal bias and variance, and therefore make the best possible use of 
available information. 

An implication of Eq. [3] is that equilibrium expectations may be estimated using data 
from any time along a driven nonequilibrium process. It is reasonable to surmise, however, 
that estimates of many properties will be improved by using data from all recorded temporal 
observations, or time slices. For example, in a single-molecule pulling experiment, estimating 
the potential of mean force as a function of molecular extension typically involves creating 
a histogram of observed extensions. The variance in this estimate can be enormous if the 
nonequilibrium density within a histogram bin is small. Only by using a weighting scheme 
to combine data from multiple time slices were Hummer and Szabo able to produce a well- 
behaved estimator for the potential of mean force.- 

The issue of stability (both numerical and statistical) in estimation from multiple time 
slices is surprisingly important. Oberhofer and Dellago^ explored variations on Hummer and 
Szabo's weighting scheme,- deriving an alternative form which achieves lower variance in the 
limit of infinite sampling. Unfortunately, correlations between time slices and the difficulty 
of accurately estimating covariance matrices from practically-sized samples, however, led 
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to large fluctuations in the weights, and hence an unstable estimator. In contrast, while 
Hummer and Szabo's approach^ is not asymptotically efficient, its balance between efficiency 
and robustness led to superior estimates in nearly all studied cases.- 

Here, we present a previously unrecognized generalization of Hummer and Szabo's ap- 
proach, applicable to the estimation of arbitrary equilibrium expectations. This generaliza- 
tion allows for multiple time-slice estimators to be constructed from any existing estimator 
for path-ensemble averages, such as optimized forms for unidirectional (the sample mean) 
or bidirectional data.— We then compare single and multiple time-slice forms in estimating 
free energies, moments of the polymer extension, and the thermodynamic length^— in a 
model single-molecule pulling experiment. 

II. THEORY 

Consider a system evolving according to dynamics in which the stationary distribution 
of a configuration x is given by 



the unnormalized density q\(x) = e~ Ux ^ depends on the reduced potential u\(x) (in which 
P — {^bT)' 1 is absorbed into the potential) and satisfies q\(x) > for all x £ T, and A is a 
vector of one or more parameters that define the thermodynamic state. 

Now suppose that the thermodynamic parameters A are varied in time according to the 
protocol A = {A ,...,At} over t £ {0,...,T}. At each time slice t, the system evolves 
according to dynamics which preserve the distribution ir\ t . For notational convenience, we 
henceforth write ir t (x) instead of -K\ t (x), u t (x) instead of it\ t (x), and Z t instead of Z\ t . 

Let E'o-yf^] denote the nonequilibrium expectation of a path functional A[X] over all 
possible realizations X = {xq, ...,xt} of a process starting with x drawn from the equilib- 
rium distribution 7r (x). We also define the equilibrium expectation of a function A(x) with 
respect to the equilibrium density 7r t (x) as E t [A] = J r dx A(x)7r t (x). With these definitions, 



7T A (x) = Z x q x {x) 



(1) 



where the partition function Z\ is, 




(2) 
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the following identity holds for all £ G {0, ... , T}: 

E t [A] = E ^t[A(x t )e- w ^}^, (3) 
where u)o->4^] denotes the appropriate work function for the switching process,- 1 ^ 3 - 

T 

wo-+ t [X] = 22 [ u t( x t) - ut-i(x t )} ■ (4) 
t=i 



Depending on how paths are sampled, the nonequilibrium expectation i?o->-t[x4] may be 
estimated via a number of methods. In this paper, we use the notation £ -^|y4] to denote 
an estimator of the nonequilibrium path expectation i?o->-t [*4] that makes use of finite data, 
St [A] an estimator for the equilibrium expectation and Z\ to denote an estimator of 

Z\ up to an arbitrary multiplicative constant that is identical for all A. 



Suppose Nf paths are sampled from a single protocol. With these sampled paths de- 
noted as Xf n , n = 1, Nf, the most appropriate estimator is the sample mean of the path 
functional over the sampled paths, 



N 



f 



^t[A] = ^J2A[X fn ]. (5) 

f n=l 

When paths are also sampled according to the reverse process, A = {Ao,...,At} = 
{At, Ao}, and the dynamics at fixed A satisfy detailed balance, the estimator,- 

S \A] = V* A[X fn ] A[X rm ] 

has been shown to be asymptotically efficient^ when the ratio Zq/Zt is estimated by choosing 
A[X] = 1, which yields the well-known Bennett acceptance ratio.— ^ Here, X rm denotes 
the time reversa^^- of a path generated using the protocol A, indexed according to m = 
1, N r . Optimal estimators relevant to trajectories sampled from multiple path-ensembles 
have also been described.- 
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A. Single time-slice estimators 



Note that for any time slice t, we can obtain an expectation with respect to an arbitrary 
7r*(x) using the importance sampling identity, 



E* [A] = E t 



A{x 



vr„(x) 

7T t (x) 



(7) 



Substituting Eq. El this may be expressed in terms of an average over nonequilibrium paths 



as. 



E*[A] = % 



0-H 



A(x t )e 



qt{x t ) 



Zo 



(8) 



Replacing the expectations with their corresponding estimators, we obtain, 



£*[A] — So^t 



A(x t ) e 



W - 



H q*(zt) 



(9) 



Qt(x t )j Z* 

Use of this expression to estimate arbitrary expectations requires an estimate of the 

unknown ratio Z / ' Z*. While in theory there exist several means of estimating this ratio, 
one important criterion for choosing an estimator is the self-consistency of Eq. [9j it is 
necessary for estimates of constant functions A{x) = C to yield the same value, C. As not 
all estimators for Zq/ Z* will properly balance the weighing factors in Eq. [9] and satisfy this 
criterion, there is a constraint on possible estimates of the ratio. Fortunately, the choice 
A(x) = 1 in Eq. [8] leads to the convenient estimator, 



Z* _ P 



-W - 



Qt(x t ) 



(10) 



When A* = At, this choice of estimator for Zq/Z* is equivalent to the single time-slice 
estimator based on Jarzynski's equality.- 1 *^ 



B. Multiple time-slice estimators 

Eq. M only uses configuration data from a single time slice, t. For some observables 
A(x), a number of time slices may contain information relevant to the estimation of _E*L4]- 
To combine data from multiple time slices in a stable manner, we consider the multiple 
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importance sampling (MIS) strategy of Guibas and Veach.— >22 As in earlier work focusing 
on potentials of mean force,- this strategy is implemented by introducing a weight function 
a t (x), subject to the conditions a t (x) > for all x and t, and the constraint Ylt=o a t\ 



x ) 



1 



for all x. By introducing these weights into Eq. [7] through a factor of unity and applying 
Eq. [31 we obtain the identity, 




Y, E * [<*t{x)A{x)] 



t=o 



T 



t=0 



a t (x t ) A(x t ) e 



■wo-n 



q*{x t ) 
Qt(xt). 



(11) 



By replacing the above expectations with estimators, we obtain the general form of the MIS 
estimator for equilibrium expectations that uses multiple time slices from driven nonequi- 
librium processes, 



n * t=o 



0-s-t 



a t (x t )A(x t )e- w °^^p. 



(12) 



Eq. [12] can be seen as a generalized form of Eq. 8 from Oberhofer and Dellago,- applicable 
not only to potentials of mean force, but to arbitrary expectations. (Applications of this 
general form to quantities obtainable from single-molecule pulling experiments, including 
potentials of mean force, are described in section Hi Dl ) 



Every weighting function a t (x) that satisfies the above conditions results in an asymptot- 
ically consistent estimator that produces the true expectation in the limit of infinite data, 
but will have different properties for finite sample sizes. The choice cxt(x) = 5u t , where 5ij 
denotes the Kronecker delta and t* a designated time slice, recapitulates the single time- 
slice estimator of Eq. Another possibility is to weight all time slices equally by choosing 
a t (x) = (T+iy\ 



Zo 



7 t=0 



A(x t ) e 



Qt(x t ) 



(13) 



As in Eq. [TOl the choice A(x) = 1 leads to an estimator for the required ratio Z*/Z , 

Unfortunately, this choice is expected to perform poorly; time slices that carry little in- 
formation about £>L4-L because their instantaneous nonequilibrium densities pQ^ t '{x) = 
Eo^ t '[S(x — Xf)] differ greatly from ir*(x), are treated equally to those that carry the most. 
(This naive weighting scheme has previously been used to estimate a potential of mean 
force.—) 



-wo- 



(14) 



A more stable choice that makes better use of all time slices weights the contribution 
from configuration x according to its equilibrium probability: 



a t (x) 



T 

E K t >(x) 

f=0 



(15) 



This choice corresponds to the balance heuristic^ of MIS, and leads to the estimator 



Z* t=o 



Z t x q*{xt) 



T 

E z v 1( lt'{x t 

t'=0 



A(x t ) e 



Again, we obtain an estimator for the ratio Z*/Z by choosing A(x) = 1, 



z 



0-s-t 



t=0 



Z t 1 q*( x t) 

T 

E Z^qtixt) 

t'=0 



(16) 



(17) 



As a word of caution, we note that use of estimators for Z^/Zq other than Eq. [T7| (such as 
Eq. |T4"|) will lead to a violation of the imposed constraint that the estimated expectation of 
a constant function is a constant. In other words, Eq. [T7]must be used with Eq. [161 while 
Eq. [TH must be used with Eq. | 



C. Thermodynamic length 



One possible application of these estimators is the calculation of thermodynamic length, 
a natural measure of distance on the manifold of equilibrium thermodynamic states (see 
Ref. — for an excellent overview of the concept). Thermodynamic length is related to the 
heat dissipated during endoreversible processes,— and may prove useful in the optimization 
of fractional distillation,— the design of molecular motors,— and the selection of efficient 
data collection protocols.— 

The thermodynamic length of a continuous protocol A = X(t) in parameter space is 
defined as a path integral of the thermodynamic metric tensor Z A , 



dt A • Z A • A 



1/2 



(18) 



where A ■ Z A ■ A denotes a vector-tensor- vector inner product (in the case of multidimensional 
thermodynamic parameters A) and A denotes the time derivative of X(t). The metric tensor 
X\ is the Fisher information matrix— on the manifold of equilibrium thermodynamic states, 



(V A hi7r A ) (V A hi7r A ) 



where xy T denotes the outer product between vectors x and y. In most situations, the 
thermodynamic length cannot be computed directly; instead, it can be approximated by 
numerical quadrature using estimates of X A computed at discrete points along A(i). 

An alternative strategy is to compute the discrete-time analogue of the Fisher length, the 
Jensen-Shannon length,— 



T-l 



CjS = ^/Vjs(7lt(x),7l t +l(x)), 



(19) 



t=0 



which contains the Jensen- Shannon divergence,— 

1 



2 3 



In 



7T, 



-Eh 



In 



U^jix) + 7T fc (z)) 



(20) 



The Jensen- Shannon length satisfies Cjs < £, approaching equality as the step size 
decreases.- Estimators for the thermodynamic length based on the Jensen-Shannon di- 
vergence have been previously derived,- 1 ^ but not tested on any data, simulated or ex- 
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perimental. In this paper we compare estimators on a model of a single-molecule pulling 
experiment. 



D. Application to single-molecule pulling experiments 

Estimators of equilibrium ensemble averages from driven nonequilibrium processes are 
particularly relevant to single-molecule pulling experiments. Indeed, single-molecule force 
spectroscopy has been used to experimentally verify theorems relating nonequilibrium pro- 
cesses with equilibrium properties.— 1 ^ These theorems have also been applied to computing 
RNA folding free energies as a function of a control parameter.— Here, we specifically con- 
sider an experiment in which two polystyrene beads are attached to a polymer, such as a 
nucleic acid or protein. One bead is held at the origin, affixed to a micropipette, and the 
other is held in an optical trap centered about position z(t) along the z-axis. 

The total reduced potential of this system at inverse temperature /3 is described by 

u t (x) = u h (x) + v t (z(x)) 

where Ub(x) is the bare reduced potential, and 



is the harmonic biasing reduced potential with spring constant k s associated with the optical 
trap (e.g. Fig. [TJ. 

In the absence of the external harmonic biasing potential Vt(z), the potential of mean 
force (PMF) along the z-axis is given by 




(21) 




(22) 
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where 5g is an arbitrary constant. This PMF may be estimated using Eq. [161 leading to, 



-g b (z)+8g 



z b 



t=o 



0-s-t 



7-1 p -u b (x t ) 

-p-^ 6(z(x t ) - z) e- w ^< 

t'=0 



z[x t ) - z)e 



t'=0 



E ( | ) e-vW 



(23) 



Defining e 5s = Zq/Z^, we obtain precisely Eq. 8 from Hummer and Szabo^ and Eq. 9 from 
Minh and Adib4 

As properties of Eq. [23] have been examined in detail elsewhere,-^ here we concentrate 
on the comparison of estimators for other equilibrium averages: 



1. Free energies of the entire system, including the harmonic potential, Fq = — In (Z t /Z ); 



2. Moments of z about the mean, E t [(z — E t [z]) n ], for n — 1, 6; 



3. The metric tensor X t associated with thermodynamic length; 



for all t 6 {0, ...,T}. 

In the single-molecule pulling experiment considered here, the trap position is the only 
thermodynamic parameter which is varied, and hence the Fisher information matrix contains 
a single element, 

l t = ((3k s ) 2 E t [(z-E t [z}) 2 }, (24) 

which is proportional to the second central moment of the polymer extension, a quantity 
observable in single-molecule pulling experiments as well as computer simulations. Hence, 
all of these quantities may be estimated in both laboratory experiments and computer 
simulations. 
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III. ILLUSTRATIVE EXAMPLE 



As a model of a single-molecule pulling experiment, consider a one-dimensional double- 
well system with a bare (unbiased) reduced potential Ub(z) = (5z 3 — lOz + 3)z (Fig. [p. We 
perform overdamped Langevin (Brownian) dynamics simulations on this system as previ- 
ously described.- 1 ^ The total reduced potential u t (z) = Ub(z) + v t (z) includes a harmonic 
biasing potential v t (z) = ^(z — z(t)) 2 with reduced spring constant (/3k s ) = 15. The 
position is propagated using z t+ \ = z t — D(d/dx)u t (x t )At + (2DAt) 1 ^ 2 R t , where the dif- 
fusion coefficient is D — 1, the time step is At = 0.001, and Rt ~ iV(0, 1) is a sequence 
of independent, identically distributed random numbers drawn from the standard normal 
distribution. After equilibration at the initial z, 250 pulling trajectories were performed in 
both the forward (z(t) = — 1.5 + 0. 004t) and reverse (z(t) = +1.5 — 0.004t) directions for 750 
steps. Equilibrium ensemble averages were estimated in three ways: using only the forward 
trajectories (the forward experiment), only the reverse trajectories (the reverse experiment), 
or half of the trajectories from the forward and reverse ensembles (the bidirectional exper- 
iment). To assess the variance and bias of the estimates, independent experiments were 
replicated 2500 times and statistics accumulated to compare the bias and variance of the 
different estimators. Reference values of the free energies Fq, moments about the mean, 
the metric tensor X t , the Jensen-Shannon divergence T>js{nti ^t+i), and the thermodynamic 
length C, were numerically computed by adaptive Gauss-Kronrod quadrature using the 
quadgk method provided in MATLAB 7.10.0.499 (R2010a). 

As can be seen in the force-extension curves (Fig. EJ), the chosen pulling speed is sufficient 
to introduce significant hysteresis into the system. While approximately the same range of 
forces and extensions are sampled near the beginning and ends of the forward and reverse 
pulling simulations, the barrier-crossing forces in the forward direction are generally higher 
than in the reverse. If the pulling speed is increased, the extent of hysteresis increases. Con- 
versely, if it is decreased, forward and reverse trajectories (after appropriate time reversal) 
are less distinguishable (data not shown). This speed was chosen to be slow enough for esti- 
mates of equilibrium quantities to converge, but fast enough so that performance differences 
between unidirectional and bidirectional estimators are evident. 

The effect of hysteresis is also evident in the work histograms (Fig. [3]), which are fairly 
broad. Indeed, the extent of dissipation makes it difficult to estimate the free energy dif- 
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ference using Jarzynski's equality; with this example, the unidirectional estimates in both 
the forward and reverse directions are overestimated. According to the Crooks fluctuation 
theorem,— the work distribution in the forward direction and the negative work in the reverse 
direction should cross at the free energy difference. While this crossing point is difficult to 
pinpoint by examining histograms (as was done in several recent analyses of single-molecule 
pulling experiments), a free energy estimate based on the Bennett Acceptance Ratio^ 1 ^ is 
fairly accurate. 

We have considered the free energy estimates not only at the end points, but as a function 
of trap position (Fig. 2]). In these calculations, the performance (bias and variance) of the 
MIS estimator is not substantially different from the single time-slice estimator (Fig. H]). 
Unidirectional estimates, as previously noted,- 1 ^ are increasingly biased and have larger 
variance as the system is driven further away from equilibrium. Using multiple time slices 
does not alleviate this situation. Indeed, the MIS estimator with uniform weight, Eq. [Mj 
has a slightly greater bias and variance than the single time-slice estimator, Eq. [TUJ Using 
the balance heuristic, Eq. [T7J leads to an estimator with very similar performance. The 
similarity of the estimates is likely due to the high degree of correlation in the work values 
from sequential time slices. This does not preclude the possibility, however, that the multiple 
time-slice estimator will perform better than the single time-slice estimator in other systems. 

On the other hand, results from different methods of estimating moments about the mean 
are more distinct (Figs. [5]and|6]), and the disparities are larger for higher-order moments. 
(Because of the large bias in unidirectional estimates, we have only shown results from 
bidirectional estimates.) With the first moment, the bias and variance properties of the 
methods are quite similar, except for the MIS estimator with uniform weighting, which 
performs slightly worse (top right of Fig. As the average of a large number (e.g. 2500) of 
estimates is likely closer to the true value than any individual estimate, it is also informative 
to examine single estimates. In this case, the single time-slice estimator has fluctuations 
which are somewhat misleading, since the actual first-order moment varies smoothly with z. 
The MIS estimator with the balance heuristic, on the other hand, has a smoothness which 
more accurately reflects the true value. 

For second- and higher-order moments, the benefits of the MIS estimator are more pro- 
nounced. In addition to the previously observed trends in smoothness (left column of Figs. [5] 
and[6|), the variance of various estimators is significantly different (right column of Figs. [5] 
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and|6]). Compared to the single time-slice estimator, the MIS estimator with the balance 
heuristic has substantially reduced variance, especially at the extreme values of z. Results 
with uniform weighting are mixed, as performance is improved at the extreme values but 
is worse in the middle of the distribution. These results not only demonstrate the value of 
pooling information from multiple time slices, but in using a high-quality weighting proce- 
dure. 

Now consider thermodynamic length. Since the metric tensor for the thermodynamic 
length X t is directly proportional to the variance, the above comparison of single and multi- 
ple time-slice estimators holds; the MIS estimator with balance heuristic produces estimates 
with much smaller statistical error than the single time-slice estimator. An alternate strat- 
egy for measuring thermodynamic length, based on the Jensen- Shannon divergence, also 
merits comparison. Feng and Crooks noted that the Jensen-Shannon divergence between 
equilibrium probability distributions along the protocol of a driven nonequilibrium process 
is related to the sum of two path-ensemble averages,— 



— e~ w °^ In - 
Z t 1 



Zt+i 



g — Wt-H+l 



z 



_ e -w T ^ t+1 ln 



t+l 



\ _|_ ^L+lg— wt+i-n 



(25) 



where the first average is over the forward and the second average over the reverse process. 
As in many of the above expressions, using this equation requires estimates of the partition 



function ratios. Feng and Crooks suggested maximizing a log-likelihood 
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T-l 



E 

t=0 

T-l 

E 

t=0 



Nf 



J2^ e -^ t [x fn ] ln 



1 z t 

n=l 1 



N r 



I _|_ ?t q— wt^t.+i ]X f „] 
Zt+i 



+ 



1 



n=l Z t+1 



Wt + l^t[X rn \ 



(26) 



We find, however, that this estimator does not perform well. Starting with an estimate from 
Eq. [TU1 we maximize L using a steepest descent method. The norm of the gradient becomes 
nearly undetectable (less than 10~ 12 ) after only a few steps. Unfortunately, the resulting 
estimate has an unreasonably large change between the first two and last two time points: 
the ratios Z§jZ\ and Z T jZ T _\ are very small. This is because Z or Z T are present in every 



13 



sum of the expression and adjusting them has a disproportionate effect on the log-likelihood. 
Because the log-likelihood is always negative (the exponential is always positive and the log 
fermi function always negative), values of Z and that are very small maximize the 
log-likelihood. Convergence of Eq. [2^] would likely require an inordinate amount of data. 

Because of the poor performance of Eq. [261 we instead use the single time-slice estima- 
tor, Eq. [101 in estimating Jensen-Shannon divergences using Eq. |25j Each path-ensemble 
average in Eq. [251 may be estimated with a unidirectional (Eq. [5]) or bidirectional (Eq. [6]) es- 
timator. The performance of the bidirectional estimator is vastly superior (See Fig. [7]). The 
unidirectional estimator strongly deviates from the reference value of the Jensen-Shannon 
divergence, especially around —0.5 < z(t) < 0.5, with a mean and standard deviation that 
indicate extremely poor convergence. The cause of this poor convergence is likely similar 
to problems in unidirectional estimates from Jarzynski's equality:- 1 ^ rare events dominate 
estimates of exponential averages.— Bidirectional estimates, on the other hand, attain much 
closer agreement with the true value. While the Jensen-Shannon divergence and the variance 
are distinct quantities, they do bear considerable resemblance, and the performance of the 
bidirectional single time-slice estimator mirrors its performance in calculating the variance. 

Hence, we have bidirectional estimators that, for our model system, perform reasonably 
well in estimating the metric tensor and the Jensen-Shannon divergence. How do these 
estimators compare in computing the thermodynamic length? In Fig. [HI we compared some 
estimates of the thermodynamic length between states with the harmonic bias centered 
around z(t) = —1.5 and 750 values of z(t) up to z(t) = 1.5. Of the methods using the metric 
tensor, the MIS estimator using the balance heuristic, as expected, performs the best. The 
Jensen- Shannon length performs rather well but somewhat underestimates the thermody- 
namic length. This is not a problem with discretization. At this level of discretization, using 
the Jensen-Shannon length is very close to the thermodynamic length; indeed, it is superior 
than applying the trapezoidal rule to the metric tensor! (See Fig. [9]). 

Trends in these estimates can be seen more clearly by a histogram of thermodynamic 
length estimates between states with the harmonic bias centered around z(t) = —1.5 and 
1.5. (See Fig. [H]). In this histogram, it is clear that estimates of the Jensen-Shannon length 
based on Eq. [251 as we h as estimates of the thermodynamic length based on the single 
time-slice estimator of the metric tensor, do not perform as well as the MIS estimator for 
the metric tensor with the balance heuristic. While the former two methods exhibit similar 
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performance, the bias and variance of the latter estimator are substantially reduced. This 
improved performance reflects the aforementioned ability of the estimator to more accurately 
estimate the metric tensor at extreme values of z(t) (Fig. El middle). 

IV. DISCUSSION 

In this paper, we have described a stable method that generalizes previous estimators 
for potentials of mean forced to estimate arbitrary equilibrium expectations using multiple 
time slices from driven nonequilibrium processes. While the estimator is not asymptotically 
efficient, we find that in our demonstrative simulations, using the balance heuristic of MIS 
leads to smooth and robust estimates of several properties with less bias and variance than 
other discussed estimators. It is possible, however, that other choices of weights will lead to 
an estimator with even better properties. 

With MIS, a good weighting function is proportional to the density from which the 
data are sampled. Thus, for sampling from multiple equilibrium distributions, the balance 
heuristic is provably good.— ^ In driven nonequilibrium processes, however, samples from 
individual time slices are not drawn from the equilibrium density, but a nonequilibrium den- 
sity which is likely closer to an equilibrium distribution earlier in the process; as mentioned 
previously, driven processes are known to exhibit lag.— A weighting function which accounts 
for the lag could lead to an estimator with superior performance. This is a possible future 
research direction. 

Athenes and Marinica^ have proposed a different estimator that pools data from multiple 
time slices of a driven nonequilibrium processes to estimate equilibrium expectations. Their 
strategy entails using a Bayesian posterior (with an equilibrium prior) for the probability 
of observing a position during an entire trajectory. As their method was developed in the 
context of biased path sampling, it is not directly relevant to the situations described in this 
paper. A future comparison of the two strategies will likely require developing their strategy 
into a new estimator. 

We conclude by noting that the MIS strategy may not only be used for combining data 
from multiple time slices from driven nonequiliibrium processes, but for pooling data from 
both equilibrium and nonequilibrium data. We expect that this feature will be useful in the 
context of enhanced equilibrium sampling methods that use driven nonequilibrium processes 
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to generate trial moves for Monte Carlo simulations.— 1 ^ 
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FIGURE CAPTIONS 



FIG. 1. Model potential for a single-molecule pulling experiment. Left: Bare 
reduced potential Ub(z). Right: Total reduced potential, including the external harmonic 
biasing potential, u t {z) at ten different times spanning from z(t) = —1.5 (blue) to +1.5 
(red). Inset: same total reduced potential, zoomed in. Potential energies are shown in units 
of ksT. 

FIG. 2. Force-Extension Curves. Ten representative force-extension curves from forward 
(green) and reverse (red) pulling simulations. 

FIG. 3. Work histograms. Representative histograms of work performed in forward 
trajectories (green) and negative work in the reverse trajectories (red). The free energy 
difference between states with the harmonic trap at z(t) = —1.5 and z(t) = +1.5, computed 
by numerical quadrature, is shown as a thick dashed black line. Estimates of the free energy 
difference from 250 forward (green) or 250 reverse (red) using Jarzynski's equality^ 1 ^ or 
125 pulling simulations in each direction using the Bennett Acceptance Ratio (blue)^ 1 ^ are 
shown as lines alternating between dashed and dotted symbols. 

FIG. 4. Estimates of free energy differences. Representative estimates (left column) 
and the mean and standard deviation of 2500 estimates (error bars, right column) of Fq 
shown as a function of z(t) from —1.5 to +1.5. Estimates were computed with the single 
time-slice estimator, Eq. [10] (red circles), the MIS estimator with uniform weighting, Eq. [14] 
(green squares), and the MIS estimator with the balance heuristic, Eq. [17] (blue triangles), 
utilizing only 250 forward (top), only 250 reverse (middle), or 125 pulling simulations in 
each direction (bottom). For improved clarity, not all points are shown. The Fq computed 
by numerical quadrature is shown as a thick dashed black line. All free energies are shown 
in units of fc^T. 

FIG. 5. Estimates of moments of z about the mean. Representative estimates (left 
column) and the mean and standard deviation of 2500 estimates (error bars, right column) of 
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the first (top), second (middle), and third (bottom) central moments, E t [(z — E t [z]) n ], shown 
as a function of z(t) from —1.5 to +1.5. Estimates were computed with the single time-slice 
estimator, Eq. [TU] (red circles or error bars), the MIS estimator with uniform weighting, 
Eq. [H] (green squares or error bars), and the MIS estimator with the balance heuristic, 
Eq. H3 (blue triangles or error bars), utilizing 125 pulling simulations in each direction. The 
inset shows a closer view of the left tail of the mean and standard deviation, at a range 
between z(t) = — 1 and —0.5. For improved clarity, not all points are shown. Moments 
computed by numerical quadrature are shown as thick dashed black lines. 

FIG. 6. Estimates of moments of z about the mean. Representative estimates (left 
column) and the mean and standard deviation of 2500 estimates (error bars, right column) 
of the fourth (top), fifth (middle), and sixth (bottom) moments about the mean. Otherwise, 
the caption in Fig. [5] applies here. 

FIG. 7. Estimates of the Jensen-Shannon divergence. Representative estimates (left 
column) and the mean and standard deviation of 2500 estimates (error bars, right column) 
of the Jensen- Shannon divergence, Djs{^t-, TTt+i), shown as a function of z(t) from —1.5 to 
+1.5. Estimates were computed with the unidirectional, Eq. [5] (top), or bidirectional, Eq. El 
estimator for the path-averages in Eq. [25] utilizing 125 pulling simulations in each direction. 
For improved clarity, not all points are shown. The value of Djsi^t, ^t+i) computed by 
numerical quadrature is shown as a thick dashed black line. 

FIG. 8. Estimates of the thermodynamic length. Representative estimates (left col- 
umn) and the mean and standard deviation of 2500 estimates (error bars, right column) of the 
thermodynamic length between states with the harmonic bias centered around z(t) = — 1.5 
and 750 values of z(t) (x-axis) up to z(t) = 1.5, with each estimate based on 125 pulling 
simulations in both directions. Estimates were either made using the trapezoidal rule with 
the metric tensor (top) or the Jensen- Shannon length (bottom). Estimates of the metric 
tensor were computed with the single time-slice estimator, Eq. [10] (red circles or error bars), 
the MIS estimator with uniform weighting, Eq. [TU (green squares or error bars), and the 
MIS estimator with the balance heuristic, Eq. [17] (blue triangles or error bars). The Jensen- 
Shannon divergence was estimated with the bidirectional estimator, Eq. [6] to compute the 
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path-averages in Eq. [251 an d length estimated with Eq. [191 For improved clarity, not all 
points are shown. The thick black line shows the value of the thermodynamic length based 
on Gauss-Kronrod quadrature. 

FIG. 9. Estimates of the total thermodynamic length. Histogram of estimates of 
the thermodynamic length from 2500 independent realizations of the pulling experiment 
using the single time-slice estimator, Eq. [10] (top) or the MIS estimator with the balance 
heuristic, Eq. [T7] (middle), to compute the metric tensor. The thermodynamic length L 
is then estimated using the trapezoidal rule. For the histogram in the bottom panel, the 
Jensen- Shannon divergence is estimated with the bidirectional estimator, Eq. El to compute 
the path-averages in Eq. [251 an d the thermodynamic length estimated with Eq. [191 All 
simulations utilized 125 pulling simulations in each direction. The thick black line shows 
the value of the thermodynamic length based on Gauss-Kronrod quadrature. In the top 
two panels, the green line shows the thermodynamic length estimated based integrating the 
metric tensor at 750 points using Gauss-Kronrod quadrature and applying the trapezoidal 
rule to compute the length. In the bottom panel, the green line is the Jensen-Shannon 
length. 
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FIGURES 




FIG. 1. Model potential for a single-molecule pulling experiment. Left: Bare reduced 
potential Ub(z). Right: Total reduced potential, including the external harmonic biasing potential, 
ut(z) at ten different times spanning from z(t) = —1.5 (blue) to +1.5 (red). Inset: same total 
reduced potential, zoomed in. Potential energies are shown in units of &bT. 




Extension 

FIG. 2. Force-Extension Curves. Ten representative force-extension curves from forward 
(green) and reverse (red) pulling simulations. 
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FIG. 3. Work histograms. Representative histograms of work performed in forward trajectories 
(green) and negative work in the reverse trajectories (red). The free energy difference between 
states with the harmonic trap at z(t) = —1.5 and z{t) = +1.5, computed by numerical quadrature, 
is shown as a thick dashed black line. Estimates of the free energy difference from 250 forward 
(green) or 250 reverse (red) using Jarzynski's equality^ 1 ^ or 125 pulling simulations in each 
direction using the Bennett Acceptance Ratio (blue)^ 1 ^ are shown as lines alternating between 
dashed and dotted symbols. 
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FIG. 4. Estimates of free energy differences. Representative estimates (left column) and 
the mean and standard deviation of 2500 estimates (error bars, right column) of Fq shown as a 
function of z(t) from —1.5 to +1.5. Estimates were computed with the single time-slice estimator, 
Eq. [TO] (red circles), the MIS estimator with uniform weighting, Eq. Q3] (green squares), and the 
MIS estimator with the balance heuristic, Eq. [T7](blue triangles), utilizing only 250 forward (top), 
only 250 reverse (middle), or 125 pulling simulations in each direction (bottom). For improved 
clarity, not all points are shown. The Fq computed by numerical quadrature is shown as a thick 
dashed black line. All free energies are shown in units of ksT. 
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FIG. 5. Estimates of moments of z about the mean. Representative estimates (left column) 
and the mean and standard deviation of 2500 estimates (error bars, right column) of the first (top), 
second (middle), and third (bottom) central moments, Et[(z — Et[z]) n ], shown as a function of z(t) 
from —1.5 to +1.5. Estimates were computed with the single time-slice estimator, Eq. [10] (red 
circles or error bars), the MIS estimator with uniform weighting, Eq. Q3] (green squares or error 
bars), and the MIS estimator with the balance heuristic, Eq. [T7] (blue triangles or error bars), 
utilizing 125 pulling simulations in each direction. The inset shows a closer view of the left tail of 
the mean and standard deviation, at a range between z(t) = — 1 and —0.5. For improved clarity, 
not all points are shown. Moments computed by numerical quadrature are shown as thick dashed 
black lines. 
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FIG. 6. Estimates of moments of z about the mean. Representative estimates (left column) 
and the mean and standard deviation of 2500 estimates (error bars, right column) of the fourth 
(top), fifth (middle), and sixth (bottom) moments about the mean. Otherwise, the caption in 
Fig. [5] applies here. 
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FIG. 7. Estimates of the Jensen-Shannon divergence. Representative estimates (left column) 
and the mean and standard deviation of 2500 estimates (error bars, right column) of the Jensen- 
Shannon divergence, Djs(TTt,^t+i), shown as a function of z(t) from —1.5 to +1.5. Estimates 
were computed with the unidirectional, Eq. [5] (top), or bidirectional, Eq. [6l estimator for the path- 
averages in Eq. [25] utilizing 125 pulling simulations in each direction. For improved clarity, not all 
points are shown. The value of Djs{^t-,^t+i) computed by numerical quadrature is shown as a 
thick dashed black line. 
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FIG. 8. Estimates of the thermodynamic length. Representative estimates (left column) 
and the mean and standard deviation of 2500 estimates (error bars, right column) of the ther- 
modynamic length between states with the harmonic bias centered around z{t) = —1.5 and 750 
values of z(t) (x-axis) up to z{t) = 1.5, with each estimate based on 125 pulling simulations in 
both directions. Estimates were either made using the trapezoidal rule with the metric tensor 
(top) or the Jensen-Shannon length (bottom). Estimates of the metric tensor were computed with 
the single time-slice estimator, Eq. [10] (red circles or error bars), the MIS estimator with uniform 
weighting, Eq. Q3] (green squares or error bars) , and the MIS estimator with the balance heuris- 
tic, Eq. [T7] (blue triangles or error bars). The Jensen-Shannon divergence was estimated with the 
bidirectional estimator, Eq. [6l to compute the path-averages in Eq. [25] and length estimated with 
Eq. [19j For improved clarity, not all points are shown. The thick black line shows the value of the 
thermodynamic length based on Gauss-Kronrod quadrature. 
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FIG. 9. Estimates of the total thermodynamic length. Histogram of estimates of the 
thermodynamic length from 2500 independent realizations of the pulling experiment using the 
single time-slice estimator, Eq. [10] (top) or the MIS estimator with the balance heuristic, Eq. [T7] 
(middle), to compute the metric tensor. The thermodynamic length C is then estimated using 
the trapezoidal rule. For the histogram in the bottom panel, the Jensen-Shannon divergence is 
estimated with the bidirectional estimator, Eq. to compute the path-averages in Eq. |2"5| and 
the thermodynamic length estimated with Eq. [191 All simulations utilized 125 pulling simulations 
in each direction. The thick black line shows the value of the thermodynamic length based on 
Gauss-Kronrod quadrature. In the top two panels, the green line shows the thermodynamic length 
estimated based integrating the metric tensor at 750 points using Gauss-Kronrod quadrature and 
applying the trapezoidal rule to compute the length. In the bottom panel, the green line is the 
Jensen-Shannon length. 
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